webui: Add a "Continue" Action for Assistant Message #16971

allozaur · 2025-11-03T15:14:07Z

Close #16097

Add Continue and Save features for chat messages

What's new

Continue button for assistant messages

Click the arrow button on any assistant response to keep generating from where it left off
Useful for getting longer outputs or continuing after you've edited a response
New content gets appended to the existing message

Save button when editing user messages

Now you get three options when editing: Cancel, Save, and Send
Save keeps your edit without regenerating the response (preserves the conversation below)
Send saves and regenerates like before
Useful when you just want to fix a typo without losing the assistant's response

Technical notes

Added continueAssistantMessage() and editUserMessagePreserveResponses() methods to ChatStore
Continue feature sends a synthetic "continue" prompt to the API (not saved to the database)
Assistant message edits now preserve trailing whitespace for proper continuation
Follows existing component architecture patterns

Demos

`ggml-org/gpt-oss-20b-GGUF`

demo1.mp4

`unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF`

demo2.mp4

allozaur

@ngxson @ggerganov lemme know if you think that this logic works for handling the "Continue" action for assistant messages.

@artyfacialintelagent fell free to test this out and give feedback!

ggerganov · 2025-11-03T15:33:25Z

Is this supposed to work correctly when pressing Continue after stopping a response while it is generating? I am testing with gpt-oss and after Continue the text does not seem to resume as expected.

allozaur · 2025-11-03T15:51:44Z

Is this supposed to work correctly when pressing Continue after stopping a response while it is generating? I am testing with gpt-oss and after Continue the text does not seem to resume as expected.

I've tested it for the edited assistant responses so far. I will take a close look at the stopped generation -> continue flow as well

Iq1pl · 2025-11-05T05:29:39Z

Is this supposed to work correctly when pressing Continue after stopping a response while it is generating? I am testing with gpt-oss and after Continue the text does not seem to resume as expected.

When using gpt-oss in Lm Studio the model generates a new response instead of continuing the previous text, this is because of the Harmony parser, uninstalling it resolves this and the model continues the generation successfully.

allozaur · 2025-11-12T17:02:04Z

@ggerganov please check the demos i've attached to the PR description and also test this feature on your end. looking forward to your feedback!

tools/server/webui/src/lib/stores/chat.svelte.ts

ggerganov · 2025-11-13T09:45:56Z

Continue feature sends a synthetic "continue" prompt to the API (not saved to the database)

Hm, I wonder why do it like this. We already have support on the server to continue the assistant message if it is the last one in the request #13174:

https://github.com/ggml-org/llama.cpp/blob/c7e23c79cf4316c385cee0a9d53b688b0d66686f/tools/server/utils.hpp#L729-L751

The current approach often does not continue properly, as can be seen in the sample videos:

Using the assistant prefill functionality above would make this work correctly in all cases.

ngxson · 2025-11-13T10:20:39Z

Agree with @ggerganov , it's better to use the prefill assistant message from #13174

Just one thing to note though, I think most templates does not support formatting the reasoning content back to original, so probably that's the only case where it will break

allozaur · 2025-11-13T10:21:26Z

Thanks guys, I missed that! Will patch it and come back to you.

allozaur · 2025-11-13T19:28:44Z

@ggerganov @ngxson

I've updated the logic with 859e496 and i have tested with few models and only 1 (Qwen3-VL-32B-Instruct-GGUF) managed to properly continue the assistant message in response to the prefill request. See videos below.

`Qwen3-VL-32B-Instruct-GGUF`

Qwen3-VL-32B-Instruct-GGUF.mov

`ggml-org/gpt-oss-20b-gguf`

gpt-oss-20b-gguf.mov

`ggml-org/gpt-oss-120b-gguf`

gpt-oss-120b-gguf.mov

`unsloth/gemma3-12b-it-gguf`

demo.mp4

ggerganov · 2025-11-13T19:59:10Z

For me, both Qwen3 and Gemma3 are able to complete successfully. For example, here is Gemma3 12B IT:

webui-continue-0.mp4

It's strange that it didn't work for you.

Regarding gpt-oss - I think that "Continue" has to also send the reasoning in this case. Currently, it is discarded and I think it confuses the model.

allozaur · 2025-11-13T20:00:28Z

Regarding gpt-oss - I think that "Continue" has to also send the reasoning in this case. Currently, it is discarded and I think it confuses the model.

Should we then address the thinking models differently for now, at least from the WebUI perspective?

It's strange that it didn't work for you.

I will do some more testing with other instruct models and make sure all is working right.

ngxson · 2025-11-13T20:03:28Z

It's likely due to chat template, I suspect some chat templates (especially jinja) adds the generation prompt. Can you verify how the chat template looks like with POST /apply-template endpoint? (the request body is the same as /chat/completions)

ggerganov · 2025-11-13T20:04:55Z

Regarding gpt-oss - I think that "Continue" has to also send the reasoning in this case. Currently, it is discarded and I think it confuses the model.

Should we then address the thinking models differently for now, at least from the WebUI perspective?

If it's not too complicated, I'd say change the logic so that "Continue" includes the reasoning of the last assistant message for all reasoning models.

ngxson · 2025-11-13T20:33:39Z

If it's not too complicated, I'd say change the logic so that "Continue" includes the reasoning of the last assistant message for all reasoning models.

The main issue is that some chat templates actively suppress the reasoning content from assistant messages, so I'm doubt if it will work cross all model.

Actually I'm thinking about a more generic approach, we can implement a feature in the backend such that both the "raw" generated text (i.e. with <think>, <reasoning>, etc) can be sent along with the already-parsed version.

I would say for now, we can put a warning in the webui to tell user that this feature is experimental and doesn't work cross all models. We can improve it later if it gets more usage.

ngxson · 2025-11-17T20:33:03Z

the response can not be continued

yeah a message like that can also be a good solution

…g the conversation payload ending with assistant message

…type

allozaur · 2025-11-18T18:30:33Z

I think it should behave like nothing is added, without any error message. IIRC LM Studio has the same behavior.

@ngxson finally decided to go with this, check e68b416 + 9f653dc

ngxson

Thanks, it works as expected now

* feat: Add "Continue" action for assistant messages * feat: Continuation logic & prompt improvements * chore: update webui build output * feat: Improve logic for continuing the assistant message * chore: update webui build output * chore: Linting * chore: update webui build output * fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message * chore: update webui build output * feat: Enable "Continue" button based on config & non-reasoning model type * chore: update webui build output * chore: Update packages with `npm audit fix` * fix: Remove redundant error * chore: update webui build output * chore: Update `.gitignore` * fix: Add missing change * feat: Add auto-resizing for Edit Assistant/User Message textareas * chore: update webui build output

ServeurpersoCom · 2025-11-20T19:13:44Z

Tested during agentic loop also :)

Stop-And-Start-With-MCP.mp4

webui: add system message in export conversation, support upload conversation with system message Webui: show upload only when in new conversation Webui: Add model name webui: increase height of chat message window when clicking editing Webui: autoclose settings dialog dropdown and maximze screen width when zoom in webui: fix date issues and add more dates webui: change error to toast.error. server: add n_past and slot_id in props_simple webui: add cache tokens, context and prompt speed in chat webui: modernize ui webui: change welcome message webui: change speed display webui: change run python icon webui: add config to use server defaults for sampler webui: put speed on left and context on right webui: recognize AsciiDoc files as valid text files (ggml-org#16850) * webui: recognize AsciiDoc files as valid text files * webui: add an updated static webui build * webui: add the updated dependency list * webui: re-add an updated static webui build Add a setting to display message generation statistics (ggml-org#16901) * feat: Add setting to display message generation statistics * chore: build static webui output webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (ggml-org#16757) * webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe dialog Extended MarkdownContent to flag previewable code languages, add a preview button alongside copy controls, manage preview dialog state, and share styling for the new button group Introduced CodePreviewDialog.svelte, a sandboxed iframe modal for rendering HTML/JS previews with consistent dialog controls * webui: fullscreen HTML preview dialog using bits-ui * Update tools/server/webui/src/lib/components/app/misc/CodePreviewDialog.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/misc/MarkdownContent.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: pedantic style tweak for CodePreviewDialog close button * webui: remove overengineered preview language logic * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> webui: auto-refresh /props on inference start to resync model metadata (ggml-org#16784) * webui: auto-refresh /props on inference start to resync model metadata - Add no-cache headers to /props and /slots - Throttle slot checks to 30s - Prevent concurrent fetches with promise guard - Trigger refresh from chat streaming for legacy and ModelSelector - Show dynamic serverWarning when using cached data * fix: restore proper legacy behavior in webui by using unified /props refresh Updated assistant message bubbles to show each message's stored model when available, falling back to the current server model only when the per-message value is missing When the model selector is disabled, now fetches /props and prioritizes that model name over chunk metadata, then persists it with the streamed message so legacy mode properly reflects the backend configuration * fix: detect first valid SSE chunk and refresh server props once * fix: removed the slots availability throttle constant and state * webui: purge ai-generated cruft * chore: update webui static build feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508) * webui : Revised LaTeX formula recognition * webui : Further examples containg amounts * webui : vitest for maskInlineLaTeX * webui: Moved preprocessLaTeX to lib/utils * webui: LaTeX in table-cells * chore: update webui build output (use theirs) * webui: backslash in LaTeX-preprocessing * chore: update webui build output * webui: look-behind backslash-check * chore: update webui build output * Apply suggestions from code review Code maintenance (variable names, code formatting, string handling) Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: Moved constants to lib/constants. * webui: package woff2 inside base64 data * webui: LaTeX-line-break in display formula * chore: update webui build output * webui: Bugfix (font embedding) * webui: Bugfix (font embedding) * webui: vite embeds assets * webui: don't suppress 404 (fonts) * refactor: KaTeX integration with SCSS Moves KaTeX styling to SCSS for better customization and font embedding. This change includes: - Adding `sass` as a dev dependency. - Introducing a custom SCSS file to override KaTeX variables and disable TTF/WOFF fonts, relying solely on WOFF2 for embedding. - Adjusting the Vite configuration to resolve `katex-fonts` alias and inject SCSS variables. * fix: LaTeX processing within blockquotes * webui: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> server : add props.model_alias (ggml-org#16943) * server : add props.model_alias webui: fix keyboard shortcuts for new chat & edit chat title (ggml-org#17007) Better UX for handling multiple attachments in WebUI (ggml-org#17246) webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (ggml-org#16618) * webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI - Purely visual and diagnostic change, no effect on model context, prompt construction, or inference behavior - Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use - Exposed parsed tool call labels beneath the assistant's model info line with graceful fallback when parsing fails - Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling - Added a user-facing setting to toggle tool call visibility to the Developer settings section directly under the model selector option * webui: remove scroll listener causing unnecessary layout updates (model selector) * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm run format & update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> webui: Fix clickability around chat processing statistics UI (ggml-org#17278) * fix: Better pointer events handling in chat processing info elements * chore: update webui build output Fix merge error webui: Add a "Continue" Action for Assistant Message (ggml-org#16971) * feat: Add "Continue" action for assistant messages * feat: Continuation logic & prompt improvements * chore: update webui build output * feat: Improve logic for continuing the assistant message * chore: update webui build output * chore: Linting * chore: update webui build output * fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message * chore: update webui build output * feat: Enable "Continue" button based on config & non-reasoning model type * chore: update webui build output * chore: Update packages with `npm audit fix` * fix: Remove redundant error * chore: update webui build output * chore: Update `.gitignore` * fix: Add missing change * feat: Add auto-resizing for Edit Assistant/User Message textareas * chore: update webui build output Improved file naming & structure for UI components (ggml-org#17405) * refactor: Component iles naming & structure * chore: update webui build output * refactor: Dialog titles + components namig * chore: update webui build output * refactor: Imports * chore: update webui build output webui: hide border of button webui: update webui: update webui: update add vision webui: minor settings reorganization and add disable autoscroll option (ggml-org#17452) * webui: added a dedicated 'Display' settings section that groups visualization options * webui: added a Display setting to toggle automatic chat scrolling * chore: update webui build output Co-authored-by: firecoperana <firecoperana>

* feat: Add "Continue" action for assistant messages * feat: Continuation logic & prompt improvements * chore: update webui build output * feat: Improve logic for continuing the assistant message * chore: update webui build output * chore: Linting * chore: update webui build output * fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message * chore: update webui build output * feat: Enable "Continue" button based on config & non-reasoning model type * chore: update webui build output * chore: Update packages with `npm audit fix` * fix: Remove redundant error * chore: update webui build output * chore: Update `.gitignore` * fix: Add missing change * feat: Add auto-resizing for Edit Assistant/User Message textareas * chore: update webui build output

allozaur requested review from ggerganov and ngxson November 3, 2025 15:14

allozaur commented Nov 3, 2025

View reviewed changes

github-actions bot added examples server labels Nov 3, 2025

allozaur force-pushed the 16097-continue-response branch 2 times, most recently from f4c3aeb to b8e4bb4 Compare November 12, 2025 17:00

allozaur added enhancement New feature or request server/webui and removed examples server labels Nov 12, 2025

github-actions bot added examples server labels Nov 12, 2025

allozaur force-pushed the 16097-continue-response branch from b8e4bb4 to e0d03e2 Compare November 12, 2025 17:07

allozaur commented Nov 12, 2025

View reviewed changes

DajanaV mentioned this pull request Nov 12, 2025

UPSTREAM PR #16971: webui: Add a "Continue" Action for Assistant Message auroralabs-loci/llama.cpp#185

Closed

allozaur force-pushed the 16097-continue-response branch from 4741f81 to c7e23c7 Compare November 12, 2025 18:02

allozaur added 15 commits November 18, 2025 19:25

feat: Add "Continue" action for assistant messages

38204ff

feat: Continuation logic & prompt improvements

edb96c5

chore: update webui build output

f151e14

feat: Improve logic for continuing the assistant message

fad6c1f

chore: update webui build output

336004e

chore: Linting

fc9b726

chore: update webui build output

965ee18

fix: Remove synthetic prompt logic, use the prefill feature by sendin…

af12cf6

…g the conversation payload ending with assistant message

chore: update webui build output

541c4cc

feat: Enable "Continue" button based on config & non-reasoning model …

147b6b6

…type

chore: update webui build output

9b960c9

chore: Update packages with npm audit fix

d24a7d0

fix: Remove redundant error

e68b416

chore: update webui build output

7487292

chore: Update .gitignore

8288ca7

allozaur force-pushed the 16097-continue-response branch from d8f952d to 8288ca7 Compare November 18, 2025 18:29

allozaur added 3 commits November 18, 2025 20:30

fix: Add missing change

dd58f81

feat: Add auto-resizing for Edit Assistant/User Message textareas

9f653dc

chore: update webui build output

d1bf855

ngxson approved these changes Nov 19, 2025

View reviewed changes

allozaur merged commit 99c53d6 into ggml-org:master Nov 19, 2025
20 of 24 checks passed

ScarletEmerald mentioned this pull request Nov 22, 2025

Misc. bug: "Continue" action does not work correctly with grammars #17435

Closed

allozaur deleted the 16097-continue-response branch December 5, 2025 22:04

allozaur mentioned this pull request Dec 29, 2025

Feature Request: Webui: Add Continue Generation #17165

Closed

4 tasks

webui: Add a "Continue" Action for Assistant Message #16971

webui: Add a "Continue" Action for Assistant Message #16971

Uh oh!

Conversation

allozaur commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add Continue and Save features for chat messages

What's new

Technical notes

Demos

ggml-org/gpt-oss-20b-GGUF

unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF

Uh oh!

allozaur left a comment

Choose a reason for hiding this comment

Uh oh!

ggerganov commented Nov 3, 2025

Uh oh!

allozaur commented Nov 3, 2025

Uh oh!

Iq1pl commented Nov 5, 2025

Uh oh!

allozaur commented Nov 12, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ggerganov commented Nov 13, 2025

Uh oh!

ngxson commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

allozaur commented Nov 13, 2025

Uh oh!

allozaur commented Nov 13, 2025

Qwen3-VL-32B-Instruct-GGUF

ggml-org/gpt-oss-20b-gguf

ggml-org/gpt-oss-120b-gguf

unsloth/gemma3-12b-it-gguf

Uh oh!

ggerganov commented Nov 13, 2025

Uh oh!

allozaur commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson commented Nov 13, 2025

Uh oh!

ggerganov commented Nov 13, 2025

Uh oh!

ngxson commented Nov 13, 2025

Uh oh!

ngxson commented Nov 17, 2025

Uh oh!

allozaur commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ServeurpersoCom commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

allozaur commented Nov 3, 2025 •

edited

Loading

`ggml-org/gpt-oss-20b-GGUF`

`unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF`

ngxson commented Nov 13, 2025 •

edited

Loading

`Qwen3-VL-32B-Instruct-GGUF`

`ggml-org/gpt-oss-20b-gguf`

`ggml-org/gpt-oss-120b-gguf`

`unsloth/gemma3-12b-it-gguf`

allozaur commented Nov 13, 2025 •

edited

Loading

allozaur commented Nov 18, 2025 •

edited

Loading

ServeurpersoCom commented Nov 20, 2025 •

edited

Loading